19 research outputs found

Effect of missing data on multitask prediction methods

Authors: A Anighoro, A Mayr, A Tropsha, Antonio de la Vega de León, AP Bento, B Chen, B Ramsundar, Beining Chen, D Fourches, D Rogers, D Weininger, G Harper, J Ma, J Simm, JG Moffat, KY Helal, L Breiman, M Glick, MR Berthold, S Kim, S Knapp, SL Kinnings, SM Wilhelm, T Unterthiner, TWH Backman, Valerie J. Gillet, Y LeCun, Y Wang, Y Xu
Publication venue: Springer Science and Business Media LLC
Publication date: 01/05/2018

There has been a growing interest in multitask prediction in chemoinformatics, helped by the increasing use of deep neural networks in this field. This technique is applied to multitarget data sets, where compounds have been tested against different targets, with the aim of developing models to predict a profile of biological activities for a given compound. However, multitarget data sets tend to be sparse; i.e., not all compound-target combinations have experimental values. There has been little research on the effect of missing data on the performance of multitask methods. We have used two complete data sets to simulate sparseness by removing data from the training set. Different models for removing the data were compared. These sparse sets were used to train two different multitask methods: deep neural networks and Macau, a Bayesian probabilistic matrix factorization technique. Results from both methods were remarkably similar and showed that the performance decrease caused by missing data is small at first, then accelerates once large amounts of data have been removed. This work provides a first approximation of how much data is required to produce good performance in multitask prediction exercises.
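The sparsification step can be illustrated with a minimal Python sketch. It assumes a complete compound-by-target activity matrix stored as a list of lists and uses uniform random removal, which is only one possible removal model (the abstract says several were compared, and this sketch does not reproduce them). The names `sparsify` and `density` and the example matrix are hypothetical. Missing values are marked with `None`; only the training matrix would be sparsified, leaving the test set complete.

```python
import random


def sparsify(matrix, fraction_removed, seed=0):
    """Return a copy of a complete compound-by-target matrix in which the given
    fraction of cells is set to None, chosen uniformly at random (an assumed
    removal model, not necessarily one used in the paper)."""
    if not 0.0 <= fraction_removed <= 1.0:
        raise ValueError("fraction_removed must be between 0 and 1")
    rng = random.Random(seed)  # seeded so each simulated sparse set is reproducible
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    # Sample cells without replacement, so exactly this many values are removed.
    removed = set(rng.sample(cells, round(fraction_removed * len(cells))))
    return [
        [None if (i, j) in removed else matrix[i][j] for j in range(n_cols)]
        for i in range(n_rows)
    ]


def density(matrix):
    """Fraction of compound-target cells that still hold a measured value."""
    values = [v for row in matrix for v in row]
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


# Hypothetical 5-compound x 4-target training matrix with made-up activities.
train = [[float(i + j) for j in range(4)] for i in range(5)]
for fraction in (0.0, 0.25, 0.5, 0.75):
    print(fraction, density(sparsify(train, fraction)))
```

Because cells are removed without replacement, each printed density equals one minus the removed fraction. Training a multitask model on each sparse copy and scoring it on the untouched test set would then trace performance against the amount of missing data.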

Crossref

Directory of Open Access Journals

White Rose Research Online

Photoacclimation strategies in northeastern Atlantic seagrasses: Integrating responses across plant organizational levels

Authors: A Alexandre, A Larbi, AH Cunha, B Olesen, B Villazán, BJ Longstaff, C Casper-Lindley, CA Ochieng, CB de los Santos, CM Duarte, D Curiel, DJ Shafer, DJ Watson, E Dattolo, EG Abal, EH Murchie, F Short, F Tuya, FG Brun, G Peralta, G Procaccini, HY Yamamoto, I Olivé, J Dalla Via, J Kuo, J De Las Rivas, J Silva, JM Ruiz, JM Sandoval-Gil, L Mazzella, M Lichtenberg, M-C Buia, ME Cummings, MJ Durako, MZ Haznedaroğlu, N Marbá, N Schubert, NM Cayabyab, O van Kooten, P Mulo, PB Reich, PJ Ralph, RM Vásquez-Elizondo, S Cabaço, S Enríquez, S Matsuki, SJ Campbell, SP Dawson, SR Park, TP O’Brien, TWH Backman, V Macic, W-T Li, WC Dennison
Publication venue: Springer Science and Business Media LLC
Publication date: 01/10/2018

Seagrasses live in highly variable light environments and adjust to these variations by expressing acclimatory responses at different plant organizational levels (meadow, shoot, leaf and chloroplast). Yet comparative studies that identify species' strategies, and that integrate the relative importance of photoacclimatory adjustments at different levels, are still missing. The variation in photoacclimatory responses at the chloroplast and leaf levels was studied along individual leaves of Cymodocea nodosa, Zostera marina and Z. noltei, including measurements of variable chlorophyll fluorescence, photosynthesis, photoprotective capacities, non-photochemical quenching and D1-protein repair, and assessments of variation in leaf anatomy and chloroplast distribution. Our results show that the slower-growing C. nodosa expressed rather limited physiological and biochemical adjustments in response to light availability, while both faster-growing Zostera species showed high variability along the leaves. In contrast, the inverse pattern was found for leaf anatomical adjustments in response to light availability, which were more pronounced in C. nodosa. This integrative approach across plant organizational levels shows that seagrasses differ in their photoacclimatory strategies and that these strategies are linked to the species' life history strategies. This information will be critical for predicting the responses of seagrasses to disturbances and for developing adequate management strategies accordingly.

Fundação para a Ciência e Tecnologia (FCT), Portugal [PTDC/MAR-EST/4257/2014

Crossref

Directory of Open Access Journals

Sapientia

Open Babel: An open chemical toolbox

Authors: A Amini, A Andronico, A Bender, A Gakh, A Karwath, A Maunz, A Poater, A Rappe, AA Gakh, AD Hill, B-b Yan, BD McKay, C Helma, C Reynès, Chris Morley, CR Jacob, Craig A James, CW Bullock, D Filimonov, D Lagorce, D Weininger, DC Bas, DC Lonie, DR Koes, F Fontaine, Geoffrey R Hutchison, GL Holliday, HL Morgan, I Wallach, IV Filippov, IV Tetko, J Ahmed, J Kazius, J Myers, J Wang, JH Chen, JJ Langham, JL Melville, JL Sharman, K Fogel, K Martin, L Fabian, L Liu, L Schietgat, M Brüstle, M Buehler, M Dehmer, M Konyk, M Krier, M Kuhn, MA Meineke, MA Miteva, Michael Banck, MJ Gómez, N O'Boyle, N Zonta, NM O'Boyle, Noel M O'Boyle, O Sperandio, P Lind, P Murray-Rust, P Rydberg, P Tosco, R Esposito, RA Bauer, RS Armen, S Arbor, S Ingsriswang, SV Trepalin, T Cheng, T Halgren, T Kogej, T Pencheva, Tim Vandermeersch, TWH Backman, U Schmidt, VV Mihaleva, William H Green, X Jiang, X Wang, YD Paila, Z Huang
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: A frequent problem in computational modeling is the interconversion of chemical structures between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and de facto standards have arisen (for example, SMILES format), the need to interconvert formats is a continuing problem due to the multitude of different application areas for chemistry data, differences in the data stored by different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-neutral formats.

Results: We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses both in terms of software products and scientific research, including applications far beyond simple format interconversion.

Conclusions: Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities, from conformer searching and 2D depiction to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. It is freely available under an open-source license fro

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Irish Universities

PubMed Central

Cork Open Research Archive

The distribution of standard deviations applied to high throughput screening

Authors: A Birmingham, A Gaulton, AM Clark, B Mazoure, B Néron, C Aldrich, C Alland, D Demirbas, D Lagorce, DJ Rogers, F Svensson, I Caraus, I Muegge, ID Shterev, J-H Zhang, JB Baell, JJ Yang, JL Dahlin, KJ Wierenga, KP Seiler, M Butkiewicz, MF Schilling, MK Gilson, N Malo, N Ruiz, O Roche, P Axerio-Cilies, P Che, QS Hanley, R Holland, S Bandyopadhyay, S Jasial, S Kim, S Okuda, S Sarkar, S Tuna, SJ Capuzzi, T Girke, T Tony Cai, TWH Backman, V Gupta, W-L Chen, Y Cao, Z Eisler
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

High throughput screening (HTS) assesses compound libraries for “activity” using target assays. A subset of HTS data contains a large number of sample measurements replicated a small number of times, providing an opportunity to introduce the distribution of standard deviations (DSD). Applying the DSD to some HTS data sets revealed signs of bias in some of the data and uncovered a sub-population of compounds exhibiting high variability, which may be difficult to screen. In the data examined, 21% of 1189 such compounds were pan-assay interference compounds. This proportion reached 57% for the most closely related compounds within the sub-population. Using the DSD, large HTS data sets can in many cases be modelled as two distributions: a large group of nearly normally distributed “inactive” compounds and a residual distribution of “active” compounds. The latter were not normally distributed, overlapped the inactive distributions on both sides, and were larger than typically assumed. As such, a large number of compounds that could become the next generation of drugs are being misclassified as “inactive” or are invisible to current methods. Although applied here to HTS, the DSD is applicable to data sets with a large number of samples measured a small number of times.
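The distribution of standard deviations can be illustrated with a short Python sketch. It computes one sample standard deviation per compound from its few replicates and, as a reference, evaluates the standard density of a sample standard deviation when n replicates are drawn from a single normal distribution with standard deviation sigma. The function names, the made-up plate data and the choice of this normal reference curve are assumptions for illustration, not the paper's own model.

```python
import math
import statistics


def sample_sds(replicates):
    """One sample standard deviation (n - 1 denominator) per compound;
    compounds with fewer than two replicates are skipped."""
    return [statistics.stdev(values) for values in replicates if len(values) >= 2]


def normal_sd_pdf(s, sigma, n):
    """Density of the sample SD of n replicates drawn from N(mu, sigma**2).

    With k = n - 1 degrees of freedom, k * s**2 / sigma**2 is chi-squared(k),
    which gives f(s) = 2 * (k / (2 sigma^2))**(k/2) * s**(k-1)
                       * exp(-k s^2 / (2 sigma^2)) / Gamma(k/2).
    """
    if n < 2 or sigma <= 0:
        raise ValueError("need n >= 2 replicates and sigma > 0")
    if s < 0:
        return 0.0
    k = n - 1
    scale = k / (2.0 * sigma**2)
    return 2.0 * scale ** (k / 2) * s ** (k - 1) * math.exp(-scale * s * s) / math.gamma(k / 2)


# Made-up triplicate readings for three compounds.
plate = [[1.0, 1.2, 0.8], [5.0, 5.0, 5.0], [2.0, 4.0, 3.0]]
print(sample_sds(plate))
print(normal_sd_pdf(0.2, sigma=0.2, n=3))
```

With sigma matched to an assay's replicate noise, the reference curve indicates where standard deviations from nearly normal "inactive" compounds would be expected to fall; an excess of large standard deviations relative to it could point to a high-variability sub-population like the one the abstract describes.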

Crossref

Nottingham Trent Institutional Repository (IRep)

Directory of Open Access Journals